Letting Kong listen on ports 80 and 443 with Linux capabilities

Background

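While experimenting with Kong in containers, I found that the Kong data plane listens on ports 8000 and 8443 by default, which is annoying.

I changed the data plane ports to 80 and 443 by declaring environment variables when starting the container, but startup failed with [nginx: [emerg] bind() to 0.0.0.0:80 failed (13: permission denied)]

In my earlier post, "Kong API gateway setup and deployment notes", changing the configuration was enough to make the data plane listen on 80 and 443, so this was a bit odd.

So I went to look at how the Kong community builds its container image, in the project that holds its Dockerfiles.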

Dockerfile

Taking alpine as an example, the alpine/Dockerfile file in the project reads as follows:

FROM alpine:3.10
LABEL maintainer="Kong Core Team <team-core@konghq.com>"

ENV KONG_VERSION 1.2.1
ENV KONG_SHA256 067bed966de064f15e548b1afbf859e724a3a5689865edc501db40cf61a7044c

RUN adduser -Su 1337 kong \
    && mkdir -p "/usr/local/kong" \
    && apk add --no-cache --virtual .build-deps wget tar ca-certificates \
    && apk add --no-cache libgcc openssl pcre perl tzdata curl libcap su-exec zip \
    && wget -O kong.tar.gz "https://bintray.com/kong/kong-alpine-tar/download_file?file_path=kong-$KONG_VERSION.apk.tar.gz" \
    && echo "$KONG_SHA256 *kong.tar.gz" | sha256sum -c - \
    && tar -xzf kong.tar.gz -C /tmp \
    && rm -f kong.tar.gz \
    && cp -R /tmp/usr / \
    && rm -rf /tmp/usr \
    && cp -R /tmp/etc / \
    && rm -rf /tmp/etc \
    && apk del .build-deps \
    && chown -R kong:0 /usr/local/kong \
    && chmod -R g=u /usr/local/kong

COPY docker-entrypoint.sh /docker-entrypoint.sh

ENTRYPOINT ["/docker-entrypoint.sh"]

EXPOSE 8000 8443 8001 8444

STOPSIGNAL SIGQUIT

CMD ["kong", "docker-start"]

As you can see, the Kong container's default command is /docker-entrypoint.sh kong docker-start
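To make the ENTRYPOINT/CMD interplay concrete, here is a minimal sketch in shell of how Docker assembles the final command line when both are written in exec form, as they are in this Dockerfile. The function name effective_command is made up for illustration; it only mimics the rule that arguments given after the image name replace CMD while ENTRYPOINT stays in place.

```shell
#!/bin/sh
# Hypothetical helper: mimics how Docker combines an exec-form ENTRYPOINT
# with either the image's CMD or the arguments passed to `docker run`.
ENTRYPOINT="/docker-entrypoint.sh"
DEFAULT_CMD="kong docker-start"

effective_command() {
  if [ "$#" -eq 0 ]; then
    # No arguments after the image name: CMD is appended to ENTRYPOINT.
    echo "$ENTRYPOINT $DEFAULT_CMD"
  else
    # Arguments after the image name replace CMD entirely.
    echo "$ENTRYPOINT $*"
  fi
}

effective_command              # like: docker run <image>
effective_command kong start   # like: docker run <image> kong start
```

So whatever is passed after the image name still goes through docker-entrypoint.sh, which is why the script's argument checks matter.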

Entrypoint

So let's see what docker-entrypoint.sh does.

#!/bin/sh
set -e

export KONG_NGINX_DAEMON=off

has_transparent() {
  echo "$1" | grep -E "[^\s,]+\s+transparent\b" >/dev/null
}

if [[ "$1" == "kong" ]]; then
  PREFIX=${KONG_PREFIX:=/usr/local/kong}

  if [[ "$2" == "docker-start" ]]; then
    shift 2
    kong prepare -p "$PREFIX" "$@"

    # workaround for https://github.com/moby/moby/issues/31243
    chmod o+w /proc/self/fd/1 || true
    chmod o+w /proc/self/fd/2 || true

    if [ "$(id -u)" != "0" ]; then
      exec /usr/local/openresty/nginx/sbin/nginx \
        -p "$PREFIX" \
        -c nginx.conf
    else
      if [ ! -z ${SET_CAP_NET_RAW} ] \
          || has_transparent "$KONG_STREAM_LISTEN" \
          || has_transparent "$KONG_PROXY_LISTEN" \
          || has_transparent "$KONG_ADMIN_LISTEN";
      then
        setcap cap_net_raw=+ep /usr/local/openresty/nginx/sbin/nginx
      fi
      chown -R kong:0 /usr/local/kong
      exec su-exec kong /usr/local/openresty/nginx/sbin/nginx \
        -p "$PREFIX" \
        -c nginx.conf
    fi
  fi
fi

exec "$@"

The script does the following:

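  • When the arguments are kong docker-start, it runs kong prepare
  • If it is running as root and either SET_CAP_NET_RAW is set or one of the listen environment variables contains transparent, it runs setcap cap_net_raw=+ep on the nginx binary; otherwise it skips this step
  • It changes the owner and group of /usr/local/kong
  • Finally, it starts the Nginx process as the kong user.

/ # ps -ef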
PID USER TIME COMMAND
1 kong 0:00 nginx: master process /usr/local/openresty/nginx/sbin/nginx -p /usr/local/kong -c nginx.conf
32 kong 0:00 nginx: worker process
33 kong 0:00 nginx: worker process
34 kong 0:00 nginx: worker process
35 kong 0:00 nginx: worker process
36 root 0:00 /bin/sh
45 root 0:00 ps -ef

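Inside the container you can see that the Nginx processes run as the kong user, so naturally they have no permission to listen on ports below 1024. So why does transparent work? Going back to the if block, it runs the command setcap cap_net_raw=+ep.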
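The rule behind that bind() failure can be written down as a small sketch. The helper name needs_bind_capability and its arguments are hypothetical. It assumes that root still holds its default capabilities and that the privileged-port threshold is the traditional 1024 unless a different value is passed in.

```shell
#!/bin/sh
# Hypothetical helper: decides whether binding PORT as user UID would need
# an extra capability. THRESHOLD defaults to the traditional 1024.
# Assumption: uid 0 still holds its default capabilities.
needs_bind_capability() {
  port=$1
  uid=$2
  threshold=${3:-1024}
  if [ "$uid" -eq 0 ]; then
    echo no
  elif [ "$port" -lt "$threshold" ]; then
    echo yes
  else
    echo no
  fi
}

needs_bind_capability 80 1337     # kong user (uid 1337 in the Dockerfile), port 80
needs_bind_capability 8000 1337   # kong user, default proxy port
```

With the kong user from the Dockerfile (uid 1337), port 80 falls on the "yes" side and port 8000 on the "no" side, which matches the error seen at startup.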

What is cap_net_raw for? I looked up the documentation, which describes it as follows:

CAP_NET_RAW
    * use RAW and PACKET sockets;
    * bind to any address for transparent proxying.

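In other words, when the script determines that kong needs the transparent feature, it adds CAP_NET_RAW to the nginx binary, so that an ordinary user can also run a program that needs transparent proxying privileges.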
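To see that decision in isolation, here is a small sketch that reuses has_transparent from the entrypoint and wraps it in a hypothetical function, wants_cap_net_raw, which prints the outcome instead of calling setcap. The argument layout (flag first, then the listen strings) is my own choice for the example and is not how the entrypoint passes them.

```shell
#!/bin/sh
# Copied from the entrypoint shown earlier.
has_transparent() {
  echo "$1" | grep -E "[^\s,]+\s+transparent\b" >/dev/null
}

# Hypothetical wrapper: $1 plays the role of SET_CAP_NET_RAW, the remaining
# arguments play the role of the KONG_*_LISTEN values.
wants_cap_net_raw() {
  flag=$1
  shift
  if [ -n "$flag" ]; then
    echo yes
    return
  fi
  for listen in "$@"; do
    if has_transparent "$listen"; then
      echo yes
      return
    fi
  done
  echo no
}

wants_cap_net_raw "" "0.0.0.0:80, 0.0.0.0:443 ssl"
wants_cap_net_raw "" "0.0.0.0:80 transparent, 0.0.0.0:443 ssl"
```

The first call has no transparent flag in its listen string and the second does, so only the second one would lead to the setcap call.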

Thoughts

So here is the question: among the Linux capabilities, is there one that lets a program bind to ports below 1024?

Yes, and it was quick to find: CAP_NET_BIND_SERVICE, described as follows:

CAP_NET_BIND_SERVICE
Bind a socket to Internet domain privileged ports (port numbers less than 1024).
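One way to check whether a running process actually holds this capability is to decode the CapEff line of /proc/<pid>/status, which is a hexadecimal bitmask. The sketch below assumes the bit numbers from linux/capability.h (10 for CAP_NET_BIND_SERVICE, 13 for CAP_NET_RAW); the helper name has_cap is hypothetical.

```shell
#!/bin/sh
# Bit positions as defined in <linux/capability.h>.
CAP_NET_BIND_SERVICE=10
CAP_NET_RAW=13

# Hypothetical helper: succeeds if bit BIT is set in the hex mask MASK,
# where MASK is the value printed on the CapEff line of /proc/<pid>/status.
has_cap() {
  mask=$1
  bit=$2
  [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]
}

# Example use against PID 1 inside the container:
#   mask=$(awk '/^CapEff/ {print $2}' /proc/1/status)
#   has_cap "$mask" "$CAP_NET_BIND_SERVICE" && echo "can bind low ports"
```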

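Hacking it

Next up is hacking the entrypoint. The change is as follows:

On line 34 of the script (the setcap line just before the final exec su-exec), the nginx binary is granted the CAP_NET_BIND_SERVICE capability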

#!/bin/sh
set -e

export KONG_NGINX_DAEMON=off

has_transparent() {
  echo "$1" | grep -E "[^\s,]+\s+transparent\b" >/dev/null
}

if [[ "$1" == "kong" ]]; then
  PREFIX=${KONG_PREFIX:=/usr/local/kong}

  if [[ "$2" == "docker-start" ]]; then
    shift 2
    kong prepare -p "$PREFIX" "$@"

    # workaround for https://github.com/moby/moby/issues/31243
    chmod o+w /proc/self/fd/1 || true
    chmod o+w /proc/self/fd/2 || true

    if [ "$(id -u)" != "0" ]; then
      exec /usr/local/openresty/nginx/sbin/nginx \
        -p "$PREFIX" \
        -c nginx.conf
    else
      if [ ! -z ${SET_CAP_NET_RAW} ] \
          || has_transparent "$KONG_STREAM_LISTEN" \
          || has_transparent "$KONG_PROXY_LISTEN" \
          || has_transparent "$KONG_ADMIN_LISTEN";
      then
        setcap cap_net_raw=+ep /usr/local/openresty/nginx/sbin/nginx
      fi
      chown -R kong:0 /usr/local/kong
      setcap cap_net_bind_service=+ep /usr/local/openresty/nginx/sbin/nginx
      exec su-exec kong /usr/local/openresty/nginx/sbin/nginx \
        -p "$PREFIX" \
        -c nginx.conf
    fi
  fi
fi

exec "$@"
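One thing to watch in this modified script: setcap sets the file's capabilities to exactly what it is given rather than adding to what is already there, so when the transparent branch has already granted cap_net_raw, the later cap_net_bind_service call would drop it again. A hedged sketch of one way around this is to build a single capability spec and apply it in one call. The helper nginx_cap_spec and the need_raw variable are hypothetical and not part of the Kong entrypoint.

```shell
#!/bin/sh
# Hypothetical helper: builds one setcap spec from the capabilities needed.
# $1 is non-empty when cap_net_raw is wanted (transparent listen or
# SET_CAP_NET_RAW); cap_net_bind_service is always included.
nginx_cap_spec() {
  caps=cap_net_bind_service
  if [ -n "$1" ]; then
    caps="cap_net_raw,$caps"
  fi
  echo "$caps=+ep"
}

# In the entrypoint this would become a single call, for example:
#   setcap "$(nginx_cap_spec "$need_raw")" /usr/local/openresty/nginx/sbin/nginx
nginx_cap_spec ""    # prints cap_net_bind_service=+ep
nginx_cap_spec 1     # prints cap_net_raw,cap_net_bind_service=+ep
```

For the experiment in this post, which does not use transparent, the script as written behaves the same either way.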

Rebuild the container image:

docker build -t newkong:v1 .

Checking the result

Run the container again:

docker run -d \
-e KONG_DATABASE=off \
-e KONG_PROXY_LISTEN='0.0.0.0:80, 0.0.0.0:443 ssl' \
-e KONG_LOG_LEVEL=notice \
-e "KONG_PROXY_ACCESS_LOG=/dev/stdout" \
-e "KONG_ADMIN_ACCESS_LOG=/dev/stdout" \
-e "KONG_PROXY_ERROR_LOG=/dev/stderr" \
-e "KONG_ADMIN_ERROR_LOG=/dev/stderr" \
--name kong \
newkong:v1

Check the logs:

docker logs kong
2019/07/05 03:35:41 [notice] 1#0: using the "epoll" event method
2019/07/05 03:35:41 [notice] 1#0: openresty/1.13.6.2
2019/07/05 03:35:41 [notice] 1#0: built by gcc 8.3.0 (Alpine 8.3.0)
2019/07/05 03:35:41 [notice] 1#0: OS: Linux 5.1.15-300.fc30.x86_64
2019/07/05 03:35:41 [notice] 1#0: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2019/07/05 03:35:41 [notice] 1#0: start worker processes
2019/07/05 03:35:41 [notice] 1#0: start worker process 33

Check the processes:

docker exec -it kong /bin/sh -c "ps -ef"
PID USER TIME COMMAND
1 kong 0:00 nginx: master process /usr/local/openresty/nginx/sbin/ngin
33 kong 0:00 nginx: worker process
39 root 0:00 ps -ef

Check the listening ports:

docker exec -it kong /bin/sh -c "netstat -lp"
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:8444 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:8001 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN -
Active UNIX domain sockets (only servers)
Proto RefCnt Flags Type State I-Node PID/Program name Path

As you can see, Nginx now listens on ports 80 and 443 successfully. Oh yeah!
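The same check can be scripted rather than eyeballed. The sketch below parses netstat-style output from stdin and reports whether the expected ports are in LISTEN state; the function names listening_ports and all_listening are hypothetical, and the column layout is assumed to match the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `netstat -l`-style output on stdin and prints
# the local port of every TCP socket in LISTEN state, sorted numerically.
listening_ports() {
  awk '$1 ~ /^tcp/ && $6 == "LISTEN" { n = split($4, a, ":"); print a[n] }' | sort -n
}

# Succeeds only if every port given as an argument is being listened on.
# Expects the netstat output on stdin.
all_listening() {
  ports=$(listening_ports)
  for want in "$@"; do
    echo "$ports" | grep -qx "$want" || return 1
  done
}

# Example use, assuming the container is named kong as above:
#   docker exec kong netstat -ltn | all_listening 80 443 && echo "80 and 443 are up"
```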

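Postscript

After mulling it over, I realized that passing a start command also lets kong listen on 80 and 443. With kong start, the entrypoint skips its docker-start branch and runs the command through the final exec "$@" as the container's own user (presumably root, since the Dockerfile above sets no USER) instead of switching to kong via su-exec.

Feels like I took the long way around...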

docker run -d \
-e KONG_DATABASE=off \
-e KONG_PROXY_LISTEN='0.0.0.0:80, 0.0.0.0:443 ssl' \
-e KONG_LOG_LEVEL=notice \
-e "KONG_PROXY_ACCESS_LOG=/dev/stdout" \
-e "KONG_ADMIN_ACCESS_LOG=/dev/stdout" \
-e "KONG_PROXY_ERROR_LOG=/dev/stderr" \
-e "KONG_ADMIN_ERROR_LOG=/dev/stderr" \
--name kong \
kong:1.2 \
kong start
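Why this works can be read off the entrypoint's argument checks. Here is a minimal sketch of that dispatch; the function entrypoint_branch is hypothetical and only reports which path the script would take, assuming the container starts as root.

```shell
#!/bin/sh
# Hypothetical helper: reports which branch of docker-entrypoint.sh a given
# argument list would take, without starting anything.
# Assumption: the container starts as root (no USER in the Dockerfile).
entrypoint_branch() {
  if [ "$1" = "kong" ] && [ "$2" = "docker-start" ]; then
    # kong prepare, optional setcap, then nginx via su-exec as user kong
    echo "docker-start: nginx runs as kong"
  else
    # falls through to the final exec "$@" with the container's own user
    echo "passthrough: exec $*"
  fi
}

entrypoint_branch kong docker-start   # the image's default CMD
entrypoint_branch kong start          # the command passed to docker run above
```

The trade-off is that the passthrough path never drops to the kong user before starting Kong, whereas the capability approach keeps the nginx master running as kong.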